Abstract

We propose a new class of models for random permutations, which we call log-linear models, 
by analogy with the log-linear models used in the analysis of contingency tables. As a special 
case, we study the family of all Luce-decomposable distributions, and the family of those 
random permutations, for which the distribution of both the permutation and its inverse 
is Luce-decomposable. We show that these latter models can be described by conditional 
independence relations. We calculate the number of free parameters in these models, and 
describe an iterative algorithm for maximum likelihood estimation, which enables us to test if 
a set of data satisfies the conditional independence relations or not. 
1 Introduction

There are three slightly different situations in which permutation-valued data may turn up.

(i) The permutation describes a pairing, i.e. a one-to-one correspondence, between two sets of cardinality $n$, whose elements are labelled with the numbers $1, \ldots, n$. An example is a pairing of boys and girls for a dance. If the first set is the boys and the second set is the girls, then a pairing is given by the permutation $\pi$, where $\pi(i) = j$ means that boy $i$ dances with girl $j$.

(ii) The permutation describes a ranking of a labelled set of cardinality $n$, i.e. the ranking from best to worst of labelled alternatives. The ranking is given by $\pi$, where $\pi(i) = j$ means that alternative $i$ is ranked $j$th best. The inverse of the ranking $\pi$ is the ordering $\pi^{-1}$, i.e. $\pi^{-1}(i)$ is the label of the alternative ranked $i$th best.

(iii) The permutation describes a reordering of a set of ordered elements. For example, books on a shelf in a library are reordered as readers look into them. Here $\pi(i) = j$ means that the $i$th item in the original order becomes the $j$th item after reordering.

Notice that a pairing or a ranking/ordering can be described by a permutation only 
after a labelling is fixed on the sets. 

There is a vast literature of models for random permutations, especially for ranking/ordering data (ii). Comprehensive summaries can be found, among others, in Critchlow, Fligner, and Verducci [4] and Marden [17]. The larger classes of models are order statistics models (also called Thurstonian models), distance-based models, paired comparison models, and multistage models.

Our starting point is the concept of Luce-decomposability, also called L-decomposability (we will use the latter name). This property, applied to orderings, was introduced by Critchlow, Fligner and Verducci in [4], motivated by Luce's ranking postulate [15]. This postulate supposes that the ordering of the alternatives is the result of repeated selections of the best alternative from the remaining set of alternatives. That is, for each set $C$ and each alternative $x \in C$, the probability that $x$ is chosen as best from $C$ is given by $p_C(x)$. Given these choice probabilities, the probability of the ordering $\pi$ is given by

$$p(\pi) = \prod_{k=1}^{n} p_{C_k}(\pi(k)),$$

where $C_k = \{\pi(k), \ldots, \pi(n)\}$ is the set of available alternatives at the $k$th step. Luce combined this postulate with his choice axiom to develop the Luce model. The choice axiom puts restrictions on the choice probabilities $p_C(x)$. The ranking postulate without the choice axiom produces a multistage model, a general L-decomposable distribution. In other words, a random ordering (or its distribution) is L-decomposable if its elements are chosen successively, satisfying the following Markov property: in the $k$th step, the $k$th element is chosen from the remaining ones, independently of the order of the first $k-1$ elements.
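
The product formula of the ranking postulate can be sketched in a few lines of Python. This is an illustration only: the names `ordering_probability` and `luce_choice` and the worth parameters are assumptions made for the example, not taken from the paper. The Luce model serves as one concrete choice of $p_C(x)$, and the snippet checks that the ordering probabilities sum to one.

```python
from itertools import permutations
from math import isclose

def ordering_probability(ordering, choice_prob):
    """Probability of an ordering under the ranking postulate.

    ordering    -- tuple of labels, best alternative first
    choice_prob -- function (x, C) -> probability that x is chosen as best from C
    """
    prob = 1.0
    for k in range(len(ordering)):
        remaining = frozenset(ordering[k:])   # C_k: alternatives still available
        prob *= choice_prob(ordering[k], remaining)
    return prob

def luce_choice(weights):
    """Luce model choice probabilities: p_C(x) = w_x / sum of w_y over y in C."""
    def choice_prob(x, C):
        return weights[x] / sum(weights[y] for y in C)
    return choice_prob

# Hypothetical worth parameters for three alternatives.
weights = {1: 3.0, 2: 2.0, 3: 1.0}
choice = luce_choice(weights)
p = {pi: ordering_probability(pi, choice) for pi in permutations(weights)}
print(isclose(sum(p.values()), 1.0))
```

Any other family of choice probabilities $p_C(x)$ that sums to one over each set $C$ can be passed in place of `luce_choice`; the result is then a general multistage model.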

In this paper, we wish to apply L-decomposability to random pairings, rankings, and 
reorderings as well. We notice that L-decomposability of a random pairing depends on 
the labelling of the first set, and L-decomposability of a random ranking depends on 
the labelling of the alternatives. In fact, the labelling under which L-decomposability is 
satisfied (if such a labelling exists) can be interpreted as a natural order of the elements 
or alternatives. If n is relatively small, this natural order may be found by an exhaustive 
search over all labellings. 

In the main part of the paper, we study random permutations $\Pi$ for which the distributions of both $\Pi$ and $\Pi^{-1}$ are L-decomposable. In this case we say that the distribution of $\Pi$ (and of $\Pi^{-1}$) is bi-decomposable. Bi-decomposability of a random pairing implies a natural order on both sets, while bi-decomposability of a random ranking/ordering implies a natural order on the set of alternatives. This new concept is perhaps most natural for random pairings, since they possess an obvious symmetry in the two sets whose elements are paired, that is in $\Pi$ and $\Pi^{-1}$. The model is also attractive in the case of rankings/orderings, since, as we shall show, a general L-decomposable distribution has $2^n(n/2 - 1) + 1$ free parameters, while a general bi-decomposable distribution possesses only $\sum_{k=1}^{n-1} k^2$ parameters. This is still more than the usual number of about $n$ or at most $n^2$ for most known models; however, it is still very small compared to $n!$.

Another feature of both the L-decomposable and the bi-decomposable families is that 
they can be characterised by certain conditional independence relations, which we will 
describe later in detail. Therefore, by fitting these models with the maximum likelihood 
method, and assessing the goodness of fit with the chi-square test, we can test the hypoth- 
esis that the random permutation satisfies certain conditional independence properties. 

The paper is organised as follows. Section 2 deals with L-decomposability, and Section 
3 contains the main results about bi-decomposable distributions. In Section 3.1, we study 
general log-linear models for random permutations, which we apply in Section 3.2 to decomposable models, and prove Theorem 1. Section 3.3 treats the problem of maximum 
likelihood estimation in the models. It is shown that the maximum likelihood estimate is 
explicit in the L-decomposable model, but in the bi-decomposable model, it can only be 
obtained by iterative methods. In Section 4, we investigate which models formulated in the 
literature are L-decomposable or bi-decomposable. In Section 5 we study to what extent 
the latent order with respect to which the distribution is decomposable can be estimated. 
Finally, in Section 6, we fit the models to a real dataset, and Section 7 contains the proofs 
of some lemmas. 



2 L-decomposability 

For integers $i \le j$, $\{i : j\}$ denotes the set $\{k : i \le k \le j\}$. For any vector $v = (v(1), \ldots, v(s))$, we call $v(i)$ the $i$th element of $v$. For the set of the $i$th to $j$th elements, and for the subvector of the $i$th to $j$th elements of $v$, introduce the notations

$$v\{i : j\} = \{v(i), \ldots, v(j)\}, \quad v(i : j) = (v(i), \ldots, v(j)), \quad 1 \le i \le j \le s. \tag{1}$$

If $j < i$ then let $v\{i : j\}$ be the empty set. Let $S_n$ stand for the symmetric group of all permutations $\pi$ of $\{1, \ldots, n\}$. We denote a probability distribution on $S_n$ by $p = \{p(\pi) : \pi \in S_n\}$. Denote by $\Pi : \Omega \to S_n$ a random permutation on a probability space $(\Omega, \mathcal{A}, P)$ with distribution $p$, that is $P(\Pi = \pi) = p(\pi)$. The idea of L-decomposability first appears in [4], and was motivated by Luce's ranking postulate [15]. It states that for any $k$, the value of $\Pi(k+1)$ depends on $\Pi(1 : k)$ only through $\Pi\{1 : k\}$. Recall that the probability of a permutation can always be written in the product form

$$P(\Pi = \pi) = \prod_{k=0}^{n-1} P(\Pi(k+1) = \pi(k+1) \mid \Pi(1 : k) = \pi(1 : k)). \tag{2}$$

L-decomposability means that the conditions $\Pi(1 : k) = \pi(1 : k)$ can be replaced by the conditions $\Pi\{1 : k\} = \pi\{1 : k\}$. We formulate this in the following definition, in four different forms. For two permutations, $\pi\sigma$ denotes composition, i.e. $\pi\sigma(i) = \pi(\sigma(i))$.

Definition 1. Let $\Pi$ be a random permutation with probability distribution $p$ on $S_n$. $\Pi$ or $p$ is called L-decomposable, if any of the following are satisfied.

1. For every $2 \le k \le n-2$, $\pi \in S_n$ and $\sigma \in S_k$

$$P(\Pi(k+1) = \pi(k+1) \mid \Pi(1 : k) = \pi(1 : k)) = P(\Pi(k+1) = \pi(k+1) \mid \Pi(1 : k) = \pi\sigma(1 : k)), \tag{3}$$

if both conditional probabilities are defined.

2. For every $2 \le k \le n-2$ and $\pi \in S_n$

$$P(\Pi(k+1) = \pi(k+1) \mid \Pi(1 : k) = \pi(1 : k)) = P(\Pi(k+1) = \pi(k+1) \mid \Pi\{1 : k\} = \pi\{1 : k\}), \tag{4}$$

if the left-hand side is defined.

3. The random sets $\Pi\{1 : k\}$ form a Markov chain for $k = 1, \ldots, n$.

4. There exist a nonnegative function $\lambda$ defined on pairs

$$\{(x, C) : C \subset \{1, \ldots, n\},\ x \notin C\}, \tag{5}$$

and a constant $c$, such that for all $\pi \in S_n$

$$p(\pi) = c \prod_{k=0}^{n-1} \lambda(\pi(k+1), \pi\{1 : k\}). \tag{6}$$

Proposition 1. The four properties in Definition 1 are equivalent.

This proposition, in slightly different form, can be found in [4], so we omit the straightforward proof. In the first two equivalent forms of the definition, we could formally include $k = 0, 1, n-1$ as well, but equations (3) and (4) are always satisfied for these $k$-values. It follows that for $n \le 3$, all distributions are L-decomposable.
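
As an illustration, the first form of the definition can be tested directly on a distribution given as a table. The sketch below is not from the paper: the function name `is_l_decomposable`, the dictionary representation of $p$ and the tolerance are assumptions made for the example. For each $k$ it compares the conditional law of $\Pi(k+1)$ across all prefixes $\Pi(1:k)$ that share the same underlying set.

```python
from itertools import permutations
from collections import defaultdict

def is_l_decomposable(p, n, tol=1e-12):
    """Check condition (3) for a distribution p on S_n.

    p maps tuples (pi(1), ..., pi(n)) to probabilities; missing keys count as 0.
    """
    for k in range(2, n - 1):                        # k = 2, ..., n-2
        prefix_mass = defaultdict(float)             # P(Pi(1:k) = prefix)
        joint_mass = defaultdict(float)              # P(Pi(1:k+1) = extended prefix)
        for pi in permutations(range(1, n + 1)):
            prob = p.get(pi, 0.0)
            prefix_mass[pi[:k]] += prob
            joint_mass[pi[:k + 1]] += prob
        seen = {}                                    # (set of prefix, next element) -> conditional
        for ext, mass in joint_mass.items():
            prefix, x = ext[:k], ext[k]
            if prefix_mass[prefix] == 0:
                continue                             # conditional probability undefined
            cond = mass / prefix_mass[prefix]
            key = (frozenset(prefix), x)
            if key in seen and abs(seen[key] - cond) > tol:
                return False
            seen.setdefault(key, cond)
    return True

uniform = {pi: 1 / 24 for pi in permutations(range(1, 5))}
two_point = {(1, 2, 3, 4): 0.5, (2, 1, 4, 3): 0.5}
print(is_l_decomposable(uniform, 4), is_l_decomposable(two_point, 4))
```

In the two-point example the third element depends on the order of the first two, not just on the set $\{1, 2\}$, so the check fails. For $n \le 3$ the loop over $k$ is empty and the function returns `True`, in line with the remark above.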

The pair $(\lambda, c)$ is called an L-decomposition of the distribution $p$ if (6) holds. By (2) and (4), one L-decomposition of the L-decomposable distribution $p$ is given by $c = 1$ and

$$\lambda(x, C) = P(\Pi(|C| + 1) = x \mid \Pi\{1 : |C|\} = C), \tag{7}$$

if the probability of the condition is positive, otherwise $\lambda(x, C) = 0$. We call this L-decomposition canonical.
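
The canonical L-decomposition (7) can be computed from a tabulated distribution and checked against the product form (6) with $c = 1$. The following is a minimal sketch under stated assumptions: the helper names, the dictionary representation of $p$ and the Luce-type example weights are illustrative choices, not part of the paper.

```python
from itertools import permutations
from collections import defaultdict
from math import isclose, prod

def canonical_decomposition(p, n):
    """lambda(x, C) = P(Pi(|C|+1) = x | Pi{1:|C|} = C), with 0 where undefined."""
    set_mass = defaultdict(float)      # P(Pi{1:k} = C)
    joint_mass = defaultdict(float)    # P(Pi{1:k} = C, Pi(k+1) = x)
    for pi, prob in p.items():
        for k in range(n):
            C = frozenset(pi[:k])
            set_mass[C] += prob
            joint_mass[(pi[k], C)] += prob
    return {(x, C): (mass / set_mass[C] if set_mass[C] > 0 else 0.0)
            for (x, C), mass in joint_mass.items()}

def product_form(pi, lam):
    """Right-hand side of (6) with c = 1."""
    return prod(lam[(pi[k], frozenset(pi[:k]))] for k in range(len(pi)))

# Hypothetical L-decomposable example: a Luce model on four alternatives.
weights = {1: 4.0, 2: 3.0, 3: 2.0, 4: 1.0}

def luce_probability(pi):
    prob, rest = 1.0, sum(weights.values())
    for x in pi:
        prob *= weights[x] / rest
        rest -= weights[x]
    return prob

p = {pi: luce_probability(pi) for pi in permutations(weights)}
lam = canonical_decomposition(p, 4)
print(all(isclose(product_form(pi, lam), p[pi]) for pi in p))
```

For a distribution that is not L-decomposable, the product of the canonical $\lambda$ values still defines a distribution, but it no longer reproduces $p$.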

The fact that the random sets $\Pi\{1 : k\}$ form a Markov chain is equivalent to the independence of the past and the future, conditional on the present. This means that the first $k$ and last $n-k$ elements of $\Pi$ are conditionally independent, given the set of the first $k$ elements. By the well-known property of Markov chains, this observation generalises to any consecutive partition of the set $\{1, \ldots, n\}$.

Any $j$-tuple $\kappa = (\kappa_1, \ldots, \kappa_j)$ with $\kappa_0 = 0 < \kappa_1 < \ldots < \kappa_j < n = \kappa_{j+1}$ is a set of sections, which define a consecutive partition $K$ of the set $\{1, \ldots, n\}$ into $j+1$ sets by

$$K = (K_1, \ldots, K_{j+1}), \quad \text{where } K_i = \{\kappa_{i-1} + 1 : \kappa_i\}. \tag{8}$$

For the consecutive partition $K$ and $\pi \in S_n$, define the vector of unordered marginals

$$\{\pi_K\} = (\{\pi_{K_i}\} : 1 \le i \le j+1), \quad \text{where } \{\pi_{K_i}\} = \pi\{\kappa_{i-1} + 1 : \kappa_i\}. \tag{9}$$

In contrast, $\pi_{K_i} = \pi(\kappa_{i-1} + 1 : \kappa_i)$ is an ordered marginal.

L-decomposability means that for any consecutive partition $K$ in (8) and any $\pi \in S_n$,

$$P(\Pi = \pi \mid \{\Pi_K\} = \{\pi_K\}) = \prod_{i=1}^{j+1} P(\Pi_{K_i} = \pi_{K_i} \mid \{\Pi_K\} = \{\pi_K\}). \tag{10}$$

Thus we have proved the following

Proposition 2. A random permutation $\Pi$ is L-decomposable if and only if, for every consecutive partition $K$ in (8), the ordered marginals $\Pi_{K_i}$, $1 \le i \le j+1$, are conditionally independent, given $\{\Pi_K\}$, that is, (10) holds.

In the language of orderings, the unordered marginal $\{\pi_K\}$ is the partial ordering of shape $K$ derived from the ordering $\pi$. This gives the set of alternatives receiving the first $\kappa_1$ ranks, the set of alternatives receiving the next $\kappa_2 - \kappa_1$ ranks, etc. Thus L-decomposability for orderings means that given the partial ordering of arbitrary shape $K$, the orderings within each set of ranks $K_i$ are independent.

3 Bi-decomposability 

We are interested in random permutations for which the distribution of both $\Pi$ and $\Pi^{-1}$ is L-decomposable. If the distribution of $\Pi$ is $p$, then the distribution of $\Pi^{-1}$ is given by $p'(\pi) = p(\pi^{-1})$. Thus, $\Pi^{-1}$ is L-decomposable if and only if 



(11) 




where $\lambda'$ is the canonical L-decomposition of $p'$, and the matrix $M_{L'}$ is derived from the matrix $M_L$ by interchanging each pair of rows corresponding to inverse permutations $\pi$ and $\pi^{-1}$. We call such distributions L'-decomposable, and denote their family by $\mathcal{P}_{L'}$. The family of bi-decomposable distributions will be denoted by $\mathcal{P}_B = \mathcal{P}_L \cap \mathcal{P}_{L'}$.

According to Proposition 2, bi-decomposable random permutations have the property that for every consecutive partition $K$ in (8), the ordered marginals $\Pi_{K_i}$, $1 \le i \le j+1$, are conditionally independent, given $\{\Pi_K\}$, and the ordered marginals $\Pi^{-1}_{K_i}$, $1 \le i \le j+1$, are conditionally independent, given $\{\Pi^{-1}_K\}$. We now show that bi-decomposable random permutations satisfy additional conditional independence statements. Let $K$ and $\Lambda$ be two consecutive partitions. For $\pi \in S_n$, define the ordered marginals

$$\pi_{K_i \times \Lambda_j} = ((a, \pi(a)) : a \in K_i,\ \pi(a) \in \Lambda_j). \tag{12}$$

We prove the next proposition in Section 7.

Proposition 3. A random permutation $\Pi$ is bi-decomposable, if and only if for all pairs of consecutive partitions $K$ and $\Lambda$ of sizes $s$ and $t$ respectively, the ordered marginals $\Pi_{K_i \times \Lambda_j}$ for $1 \le i \le s$ and $1 \le j \le t$ are conditionally independent, given $\{\Pi_K\}$ and $\{\Pi^{-1}_\Lambda\}$.

In the rest of the paper, we focus our attention on strictly positive distributions, i.e. the case when $p(\pi) > 0$ for all $\pi \in S_n$, which can be described as exponential families, or more specifically, as log-linear models. In general, by an exponential family of discrete (strictly positive) distributions $p = (p_1, \ldots, p_s)$ we mean the family

$$\{p : \log p \in U\}, \tag{13}$$

where $U$ is a $t$-dimensional linear subspace of $\mathbb{R}^s$, containing the vector $\mathbf{1} = (1, \ldots, 1)^T$. The number of free parameters of the exponential family (13) is $t - 1$. Extending the concept used in the analysis of contingency tables, we may define a log-linear model as an exponential family where the linear subspace $U$ has a generating set consisting of 0-1 vectors. All log-linear models for random permutations appearing in this paper have the additional property that the generating 0-1 vectors of $U$ are indicator vectors of different values of generalised marginals $|\pi_{\mathcal{B}}|$.

Denote by $\mathcal{P}^+_L$ and $\mathcal{P}^+_{L'}$ the family of strictly positive L-decomposable and L'-decomposable distributions respectively. Then the family of strictly positive bi-decomposable distributions is given by $\mathcal{P}^+_B = \mathcal{P}^+_L \cap \mathcal{P}^+_{L'}$. We will show that both $\mathcal{P}^+_L$ and $\mathcal{P}^+_{L'}$ admit a log-linear representation with corresponding linear subspaces $F$ and $G$ respectively. It follows that $\mathcal{P}^+_B$ is also an exponential family with subspace $H = F \cap G$. We will show that $\mathcal{P}^+_B$ is also a log-linear model, and we will determine the dimension and a basis of $H$ (Theorem 1).
This is made possible by the abundance of orthogonality. Two subspaces $U$ and $V$ of a Hilbert space are called orthogonal, if every pair of vectors $u \in U$, $v \in V$ are orthogonal. The closed subspaces $U$ and $V$ intersect each other orthogonally, if the (orthogonal) projection of $U$ on $V$ equals $U \cap V$, or equivalently, the projection of $V$ on $U$ equals $U \cap V$. Denote the operator of orthogonal projection on $U$ by $\mathrm{Pr}_U$. Another equivalent condition for orthogonal intersection is that the projection operators $\mathrm{Pr}_U$ and $\mathrm{Pr}_V$ commute. Thus introducing the notation $\perp_\cap$ for orthogonal intersection,

$$U \perp_\cap V \iff \mathrm{Pr}_U V = U \cap V \iff \mathrm{Pr}_V U = U \cap V \iff \mathrm{Pr}_U \mathrm{Pr}_V = \mathrm{Pr}_V \mathrm{Pr}_U. \tag{14}$$

We will show that $F$ and $G$ intersect each other orthogonally; furthermore, we will find an orthogonal decomposition of both $F$ and $G$ into lower dimensional subspaces $F^k$ and $G^\ell$, such that each pair of subspaces $(F^k, G^\ell)$ intersect each other orthogonally. Then it will suffice to determine the dimension and basis of the low dimensional subspaces $F^k \cap G^\ell$. Orthogonal intersection does not appear by coincidence; it is the consequence of conditional independence relations, as we will explain later on. Before carrying out this program in the following subsections, we state the main theorem of this section.

Theorem 1. The family of positive bi-decomposable distributions is a log-linear model with the number of free parameters equal to

$$d_n = \sum_{i=1}^{n-1} i^2. \tag{15}$$

The proof of Theorem 1 is given in Section 3.2.
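
For orientation, the sum in (15) has the standard closed form of a sum of squares, and small cases can be evaluated by hand. The values below follow directly from (15) and are a worked illustration rather than part of the theorem; the case $n = 3$ gives $3! - 1$, so every positive distribution on $S_3$ is bi-decomposable.

```latex
d_n = \sum_{i=1}^{n-1} i^2 = \frac{(n-1)\,n\,(2n-1)}{6},
\qquad d_3 = 5 = 3! - 1, \quad d_4 = 14, \quad d_5 = 30 .
```
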
3.1 Partitions of the chessboard 

A permutation may be identified with a placement of $n$ rooks on the $n \times n$ chessboard such that they cannot capture each other, i.e. a placement with exactly one rook in each row and in each column. Let us agree that we place the rooks "row-wise", that is if $\pi(i) = j$, then we place a rook in the $j$th square of the $i$th row. $\pi^{-1}$ can be read "column-wise" from the rook-placement. This identification is helpful in the study of bi-decomposability, because bi-decomposability is a symmetric property in $\pi$ and $\pi^{-1}$, that is in rows and columns of the chessboard.

In this section, we define and study log-linear models for random permutations, whose generators are partitions of the chessboard. More specifically, we require that these partitions be the product of a row-partition and a column-partition. A partition of the set $\{1 : n\}$ into $s$ disjoint subsets (also called atoms) is given by

$$\mathcal{Z} = (Z_1, \ldots, Z_s) : \quad \bigcup_{i=1}^{s} Z_i = \{1 : n\}, \quad Z_i \cap Z_j = \emptyset \ \ \forall i \ne j. \tag{16}$$

If none of the sets $Z_i$ is empty, we call $s$ the size of the partition. If of two such partitions, one partitions the set of rows, the other the set of columns of the $n \times n$ chessboard, then the result is a product partition of the board.

Definition 2. A partition $\mathcal{B}$ of the $n \times n$ chessboard is a product partition, if there exist a partition $\mathcal{R}$ of size $r$ (called row-partition) and a partition $\mathcal{C}$ of size $c$ (called column-partition) of the set $\{1 : n\}$ such that

$$\mathcal{B} = (B_{ij}) : \quad B_{ij} = R_i \times C_j = \{(x, y) : x \in R_i,\ y \in C_j\}, \quad 1 \le i \le r,\ 1 \le j \le c. \tag{17}$$

We denote this by $\mathcal{B} = \mathcal{R} \times \mathcal{C}$.

For any product partition $\mathcal{B}$ of the $n \times n$ chessboard, we define the matrix-valued $\mathcal{B}$-marginal function $\pi \mapsto |\pi_{\mathcal{B}}|$ on $S_n$. For a permutation $\pi$, this statistic gives the number of rooks falling into each $B_{ij}$ in the rook-placement corresponding to $\pi$:

$$|\pi_{\mathcal{B}}| = (t_{ij}), \quad t_{ij} = |\{1 \le s \le n : (s, \pi(s)) \in B_{ij}\}|. \tag{18}$$

In other words, if $\pi$ is a pairing between two labelled sets $A$ and $B$, then $|\pi_{\mathcal{B}}|$ is the $r \times c$ matrix whose $ij$th entry is the number of elements of $A$ belonging to $R_i$, which are paired with an element of $B$ belonging to $C_j$. The partition $\mathcal{B}$ of the chessboard gives rise to a partition of $S_n$ via the function $\pi \mapsto |\pi_{\mathcal{B}}|$: the permutations $\pi$ and $\sigma$ belong to the same atom of this partition if and only if $|\pi_{\mathcal{B}}| = |\sigma_{\mathcal{B}}|$. The subspace of $\mathbb{R}^{n!}$ spanned by the indicator vectors of these atoms will be denoted by $U^{\mathcal{B}}$. Equivalently

$$U^{\mathcal{B}} = \{v \in \mathbb{R}^{n!} : |\pi_{\mathcal{B}}| = |\sigma_{\mathcal{B}}| \implies v(\pi) = v(\sigma)\}. \tag{19}$$

The vectors $v \in U^{\mathcal{B}}$ are just the functions $\pi \mapsto v(\pi)$ on $S_n$, which are measurable with respect to the (atomic) $\sigma$-algebra with atoms $\{\pi : |\pi_{\mathcal{B}}| = (t_{ij})\}$, where $(t_{ij})$ takes all possible values. We denote this $\sigma$-algebra by $\sigma(\mathcal{B})$.
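
The marginal (18) is simple to compute from a rook placement. The sketch below is illustrative: the function name `b_marginal` and the representation of partitions as lists of sets are assumptions for the example, not notation from the paper.

```python
def b_marginal(pi, row_partition, col_partition):
    """Matrix |pi_B| of (18) for the product partition B = R x C.

    pi            -- tuple (pi(1), ..., pi(n)); a rook sits at (s, pi(s))
    row_partition -- list of sets R_1, ..., R_r partitioning {1, ..., n}
    col_partition -- list of sets C_1, ..., C_c partitioning {1, ..., n}
    """
    t = [[0] * len(col_partition) for _ in row_partition]
    for s, image in enumerate(pi, start=1):
        i = next(a for a, R in enumerate(row_partition) if s in R)      # row block of the rook
        j = next(b for b, C in enumerate(col_partition) if image in C)  # column block of the rook
        t[i][j] += 1
    return t

# Rooks at (1,2), (2,4), (3,1), (4,3) on the 4 x 4 board.
pi = (2, 4, 1, 3)
rows = [{1, 2}, {3, 4}]
cols = [{1}, {2, 3}, {4}]
print(b_marginal(pi, rows, cols))   # [[0, 1, 1], [1, 1, 0]]
```

The row sums of the result equal the sizes of the row blocks and the column sums equal the sizes of the column blocks, which is the constraint exploited later for the $3 \times 3$ cross-sections.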

We will define a log-linear model by a set of product partitions, called the generators of 
the model. Of course, a similar definition is possible also with generator partitions which 
are not of product form. For the spanned subspace, we use the notation Span(-). 

Definition 3. Let $\mathcal{B}_1, \ldots, \mathcal{B}_s$ be product partitions of the chessboard, and use the simplifying notation $U^{\mathcal{B}_i} = U^i$. We say that $p$ belongs to the log-linear model generated by these partitions if

$$\log p(\pi) = \sum_{i=1}^{s} \theta^i(|\pi_{\mathcal{B}_i}|), \quad \pi \in S_n, \tag{20}$$

where the $\theta^i$ functions are arbitrary parameters. Equivalently, we require that

$$\log p \in \mathrm{Span}(U^1, \ldots, U^s). \tag{21}$$

We will use the notation $\mathcal{L}(\mathcal{B}_1, \ldots, \mathcal{B}_s)$ for this model.

In the rest of this section, we give a sufficient condition under which the intersection of two log-linear models is itself a log-linear model, with directly identifiable generators. The proofs can be found in Section 7. The first lemma describes the relationship between conditional independence and orthogonal intersection.

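Lemma 1. Let $(\Omega, \mathcal{A}, P)$ be a probability space, and denote by $L_2(\mathcal{A})$ the Hilbert space of square-integrable random variables on it. For a $\sigma$-algebra $\mathcal{D} \subset \mathcal{A}$, denote by $L_2(\mathcal{D})$ the closed linear subspace of $L_2(\mathcal{A})$ consisting of all $\mathcal{D}$-measurable random variables. Let $\mathcal{D}_1, \mathcal{D}_2 \subset \mathcal{A}$. Then $L_2(\mathcal{D}_1) \perp_\cap L_2(\mathcal{D}_2)$ if and only if $\mathcal{D}_1$ and $\mathcal{D}_2$ are conditionally independent, given $\mathcal{D}_1 \cap \mathcal{D}_2$.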

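There is a partial ordering on the set of partitions. Partition $\mathcal{Z} = (Z_1, \ldots, Z_s)$ is finer than $\mathcal{W} = (W_1, \ldots, W_t)$ (or $\mathcal{W}$ is coarser than $\mathcal{Z}$) if for every $i$ there exists a $j$ such that $Z_i \subset W_j$. Denote this by $\mathcal{Z} \succ \mathcal{W}$. Clearly, refining a product partition enlarges the corresponding $\sigma$-algebra $\sigma(\mathcal{B})$, and hence the subspace $U^{\mathcal{B}}$. By the application of Lemma 1 we get

Lemma 2. Let $\mathcal{R}' \succ \mathcal{R}$ and $\mathcal{C}' \succ \mathcal{C}$ be partitions of $\{1 : n\}$. Then we have

$$U^{\mathcal{R} \times \mathcal{C}'} \perp_\cap U^{\mathcal{R}' \times \mathcal{C}}, \qquad U^{\mathcal{R} \times \mathcal{C}'} \cap U^{\mathcal{R}' \times \mathcal{C}} = U^{\mathcal{R} \times \mathcal{C}}. \tag{22}$$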

The next two lemmas formulate simple facts from linear algebra, which will be needed in the sequel. We write $U = U_1 \oplus U_2$ for orthogonal decomposition, that is when $U = \mathrm{Span}(U_1, U_2)$ and $U_1$ and $U_2$ are orthogonal.

Lemma 3. Suppose that $U = \mathrm{Span}(U_i : i \in I)$, $V = \mathrm{Span}(V_j : j \in J)$ are two subspaces, and $U_i \perp_\cap V_j$ for every pair $i, j$. Then $U \perp_\cap V$, and $U \cap V = \mathrm{Span}(U_i \cap V_j : i \in I, j \in J)$.

Lemma 4. Let $U = U_1 \oplus U_2$ and $V = V_1 \oplus V_2$ be two subspaces with orthogonal decompositions. If $U \perp_\cap V$, $U_1 \perp_\cap V_1$, $U \perp_\cap V_1$, and $U_1 \perp_\cap V$ hold, then $U_2 \perp_\cap V_2$ is also true, and

$$U \cap V = (U_1 \cap V_1) \oplus (U_1 \cap V_2) \oplus (U_2 \cap V_1) \oplus (U_2 \cap V_2).$$

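As a direct corollary of Lemma 2 and Lemma 3 we obtain

Corollary 1. Let $\mathcal{L}(\mathcal{R}_i \times \mathcal{C} : i = 1, \ldots, s)$ and $\mathcal{L}(\mathcal{R} \times \mathcal{C}_j : j = 1, \ldots, t)$ be two log-linear models, and suppose that $\mathcal{R} \succ \mathcal{R}_i$ and $\mathcal{C} \succ \mathcal{C}_j$ for all $1 \le i \le s$, $1 \le j \le t$. Then the intersection of the two models is the log-linear model $\mathcal{L}(\mathcal{R}_i \times \mathcal{C}_j : i = 1, \ldots, s,\ j = 1, \ldots, t)$.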

3.2 Decomposability as a log-linear model

In this section, we prove Theorem 1. Recall from (8) the definition of a consecutive partition of $\{1 : n\}$. A consecutive partition which contains only two neighboring sections is called a bold section. That is, the $k$th bold section, containing the sections $k-1$ and $k$, is given by

$$\Phi_k = (\{1 : k-1\}, \{k\}, \{k+1 : n\}), \quad 2 \le k \le n-1. \tag{23}$$

We will extend the notation $\Phi_k$ to $k = 1$ and $k = n$ for the sake of convenience. The (consecutive) partition which partitions $\{1 : n\}$ into $n$ sets is called the full partition:

$$\Psi = (\{1\}, \{2\}, \ldots, \{n\}). \tag{24}$$

From the multiplicative form (6), it is straightforward that the L-decomposable exponential family is the log-linear model

$$\mathcal{P}^+_L = \mathcal{L}(\Phi_k \times \Psi : 1 \le k \le n). \tag{25}$$

That is, the generators are the products of bold sections with the full partition. The reason is that the 0-1 matrix $|\pi_{\Phi_k \times \Psi}|$ is equivalent to the vector of unordered marginals $\{\pi_{\Phi_k}\}$ defined in (9).

A submodel which we will use in the sequel has as generators coarser partitions. A consecutive partition which contains only one section is called a thin section. Denote the $k$th thin section by

$$\varphi_k = (\{1 : k\}, \{k+1 : n\}), \quad 1 \le k \le n-1. \tag{26}$$

Notice that $\Phi_1 = \varphi_1$ and $\Phi_n = \varphi_{n-1}$. We extend the notation to $k = n$ for the sake of convenience. The log-linear model

$$\mathcal{P}^+_{L_S} = \mathcal{L}(\varphi_k \times \Psi : 1 \le k \le n-1) \tag{27}$$

is a submodel of the L-decomposable family, which consists of positive distributions $p$ for which the conditional probability $P(\Pi(|C| + 1) = x \mid \Pi\{1 : |C|\} = C)$ depends on the pair $(x, C)$ only through their union, $C \cup \{x\}$, where, as before, $\Pi$ is a random permutation with distribution $p$. We will call these distributions $L_S$-decomposable, where S stands for "set", indicating that the choice of the $k$th element of the random permutation depends only on the set to be formed by the first $k$ elements. We define $L'_S$-decomposable distributions similarly.

Recall from (19) the definition of the subspace corresponding to a product partition, and for the sake of brevity introduce the notations

$$U^{\Phi_k \times \Psi} = U^k, \qquad U^{\varphi_k \times \Psi} = \bar{U}^k. \tag{28}$$

From the partial ordering of partitions, we get that $\bar{U}^k \subset U^k$; denote the orthogonal complement of $\bar{U}^k$ in $U^k$ by $F^k$. By the same argument, $\bar{U}^k \subset U^{k+1}$ also holds. This yields that

$$\mathrm{Span}(U^k : 1 \le k \le n) = \mathrm{Span}(F^k : 1 \le k \le n,\ \bar{U}^n).$$

Since $\Phi_1 = \varphi_1$, we get $F^1 = \{0\}$. In addition, as $\varphi_n$ is the trivial partition, $\bar{U}^n = \mathrm{Span}(\mathbf{1})$. Thus the subspace belonging to the L-decomposable log-linear model is

$$F = \mathrm{Span}(U^k : 1 \le k \le n) = \mathrm{Span}(F^k : 2 \le k \le n,\ \mathbf{1}). \tag{29}$$

In the next lemma, we show that the subspaces on the right-hand side of (29) not only span $F$, but give an orthogonal decomposition. The proof is found in Section 7.

Lemma 5. The subspaces $F^k$ ($2 \le k \le n$) are orthogonal to each other and to the vector $\mathbf{1}$.

The number of free parameters in the L-decomposable exponential family, which we denote by $b_n$, is the dimension of $F$ minus one. From the orthogonal decomposition (29) of $F$ it is immediate that

$$b_n = \dim(F) - 1 = \sum_{k=2}^{n} \dim(F^k) = \sum_{k=2}^{n} \binom{n}{k}(k-1) = 2^n(n/2 - 1) + 1, \tag{30}$$

where the dimension of $F^k$ is easy to calculate.
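
The count (30) can be cross-checked numerically for small $n$ by building the indicator vectors that span $F$ and computing their rank exactly. The sketch below is an illustration, not the paper's method: the function names are assumptions, and the dimension of $H = F \cap G$ is obtained from the standard identity $\dim(F \cap G) = \dim F + \dim G - \dim(F + G)$ for comparison with Theorem 1.

```python
from fractions import Fraction
from itertools import permutations

def rank(rows):
    """Rank of a matrix (list of rows) by exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for col in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            if m[i][col] != 0:
                factor = m[i][col] / m[r][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def generators(n, inverse=False):
    """Indicator vectors of the events {Pi{1:k} = C, Pi(k+1) = x}, one row per event.

    With inverse=True the same events are taken for the inverse permutation.
    """
    perms = list(permutations(range(1, n + 1)))
    events = {}
    for idx, pi in enumerate(perms):
        if inverse:
            # position i with pi(i) = j, listed for j = 1, ..., n
            pi = tuple(sorted(range(1, n + 1), key=lambda j: pi[j - 1]))
        for k in range(n):
            key = (frozenset(pi[:k]), pi[k])
            events.setdefault(key, [0] * len(perms))[idx] = 1
    return list(events.values())

def free_parameters(n):
    """Return (dim F - 1, dim H - 1) for the L- and bi-decomposable models."""
    F = generators(n)
    G = generators(n, inverse=True)
    dim_F, dim_G, dim_sum = rank(F), rank(G), rank(F + G)
    return dim_F - 1, dim_F + dim_G - dim_sum - 1

print(free_parameters(3), free_parameters(4))
```

Exact rational arithmetic is used so that the rank is not affected by rounding; the approach is only practical for very small $n$, since the vectors have $n!$ coordinates.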

The decomposition (29) simplifies the calculations regarding the dimension of the bi-decomposable model as well. Interchanging the role of rows and columns, we see that the L'-decomposable log-linear family is

$$\mathcal{P}^+_{L'} = \mathcal{L}(\Psi \times \Phi_k : 1 \le k \le n). \tag{31}$$

Define the $(k, \ell)$th bold cross-section as

$$\mathcal{H}_{k\ell} = \Phi_k \times \Phi_\ell, \tag{32}$$

where the component (row and column) partitions are bold sections defined in (23). By Corollary 1, the bi-decomposable log-linear model is given by

$$\mathcal{P}^+_B = \mathcal{L}(\mathcal{H}_{k\ell} : 1 \le k, \ell \le n). \tag{33}$$

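In order to find the dimension of $H$, define the subspaces $V^\ell, \bar{V}^\ell, G^\ell$ in the L'-decomposable model just as we defined $U^k, \bar{U}^k, F^k$ in the L-decomposable model. Using the notation introduced in (19),

$$V^\ell = U^{\Psi \times \Phi_\ell}, \qquad \bar{V}^\ell = U^{\Psi \times \varphi_\ell}. \tag{34}$$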

Applying Lemma 5, the subspace corresponding to the L'-decomposable model can be written as

$$G = \bigoplus_{\ell=2}^{n} G^\ell \oplus \mathrm{Span}(\mathbf{1}). \tag{35}$$

By Lemma 2, for any pair

$$U \in \{U^k, \bar{U}^k : 2 \le k \le n\}, \qquad V \in \{V^\ell, \bar{V}^\ell : 2 \le \ell \le n\},$$

$U \perp_\cap V$, since these subspaces correspond to product partitions, where in the $U$-partitions, the column-partition is as fine as possible, and in the $V$-partitions, the row-partition is as fine as possible. By Lemma 4 we get $F^k \perp_\cap G^\ell$ for all $k, \ell$. By Lemma 3, the space $H = F \cap G$ corresponding to bi-decomposable distributions has the orthogonal decomposition

$$H = \bigoplus_{2 \le k, \ell \le n} (F^k \cap G^\ell) \oplus \mathrm{Span}(\mathbf{1}). \tag{36}$$

It remains to find the dimension and a basis of $F^k \cap G^\ell$. For the time being, fix $k$ and $\ell$. By Lemma 2, the subspace corresponding to the $(k, \ell)$th bold cross-section $\mathcal{H}_{k\ell}$ is just $U^k \cap V^\ell$. Observe that $F^k \cap G^\ell$ consists of exactly those vectors of the space $U^k \cap V^\ell$ which are orthogonal to both $\bar{U}^k$ and $\bar{V}^\ell$. Recall that the $\pi$th coordinate of a vector in $U^k \cap V^\ell$ depends only on its marginal $|\pi_{\mathcal{H}_{k\ell}}| = (t_{ij})_{1 \le i, j \le 3}$ defined by (18), that is the number of rooks $\pi$ places in the nine parts into which the $(k, \ell)$th bold cross-section divides the chessboard.

As the nine elements of the matrix $|\pi_{\mathcal{H}_{k\ell}}|$ must satisfy row-sum and column-sum constraints, the matrix is determined by its entries $t_{ij}$ for $i, j = 1, 2$. Furthermore, all of $t_{12}, t_{21}, t_{22}$ can be either zero or one. We will specify the marginal $|\pi_{\mathcal{H}_{k\ell}}|$ by two coordinates: $a = t_{11} + t_{12} + t_{21} + t_{22}$ and $q$, where $q$ codes the placement of the rooks in the middle row and column of the $3 \times 3$ partition. Our coding is as follows. In the $k$th row and $\ell$th column, there is either one rook in the intersection of the row and the column (code 5), or there are two rooks, one on a horizontal, and one on a vertical arm of the cross. In this latter case, the two occupied arms point towards a plane-quarter, and we use the usual numbering of the plane-quarters as coding (one or two arms of the cross may be missing, but this does not cause any problems). That is, for fixed $k, \ell$,



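$$a^{k\ell}(\pi) = |\{i : 1 \le i \le k,\ 1 \le \pi(i) \le \ell\}|, \tag{37}$$

and

$$q^{k\ell}(\pi) = \begin{cases}
1 & \text{if } \pi(k) > \ell,\ \pi^{-1}(\ell) < k, \\
2 & \text{if } \pi(k) < \ell,\ \pi^{-1}(\ell) < k, \\
3 & \text{if } \pi(k) < \ell,\ \pi^{-1}(\ell) > k, \\
4 & \text{if } \pi(k) > \ell,\ \pi^{-1}(\ell) > k, \\
5 & \text{if } \pi(k) = \ell.
\end{cases} \tag{38}$$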

Since the $\pi$th coordinate of a vector in $U^k \cap V^\ell$ depends only on its marginal $|\pi_{\mathcal{H}_{k\ell}}|$, a basis of $U^k \cap V^\ell$ is given by the indicator vectors of all the possible values of this marginal. Therefore, for each $a, q$, we define this indicator vector

$$\rho^{k\ell}_{aq}(\pi) = \chi\{a^{k\ell}(\pi) = a,\ q^{k\ell}(\pi) = q\}. \tag{39}$$

Of course, for many pairs $a, q$, these are zero vectors. We now determine all cases when $\rho^{k\ell}_{aq}$ is not identically zero. First, we need

$$\max(0, k + \ell - n) \le a \le \min(k, \ell), \tag{40}$$

as there must be a non-negative number of rooks in each rectangle of the board. $q$ can usually be anything from 1 to 5, except

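$$\begin{aligned}
a = 0 &\implies q = 4, \\
a = 1 &\implies q \in \{1, 3, 4, 5\}, \\
a = k < \ell &\implies q \in \{2, 3, 5\}, \\
a = \ell < k &\implies q \in \{1, 2, 5\}, \\
a = \ell = k &\implies q \in \{2, 5\}.
\end{aligned} \tag{41}$$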



We call the pairs $a, q$ satisfying (40) and (41) non-trivial pairs. After all this preparation, we are ready for the proof of Theorem 1.
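
The two coordinates $a$ and $q$ can be read off a permutation directly. The following sketch is illustrative (the function names `a_stat` and `q_stat` are assumptions, not the paper's notation); it computes the statistics for one example, and the restrictions on $q$ for special values of $a$ can be confirmed by enumeration over $S_4$.

```python
from itertools import permutations

def a_stat(pi, k, l):
    """Number of rooks in the upper-left k x l rectangle of the board."""
    return sum(1 for i in range(1, k + 1) if pi[i - 1] <= l)

def q_stat(pi, k, l):
    """Code (1..5) for the rooks in row k and column l, using plane-quarter numbering."""
    row_rook = pi[k - 1]           # pi(k): column of the rook in row k
    col_rook = pi.index(l) + 1     # inverse of pi at l: row of the rook in column l
    if row_rook == l:
        return 5                   # a single rook at the centre of the cross
    if row_rook > l:
        return 1 if col_rook < k else 4
    return 2 if col_rook < k else 3

pi = (2, 4, 1, 3)                  # rooks at (1,2), (2,4), (3,1), (4,3)
print(a_stat(pi, 2, 2), q_stat(pi, 2, 2))   # 1 1

# Enumerate which codes q occur together with a = 0 on the 4 x 4 board.
codes_when_a_is_zero = {q_stat(s, k, l)
                        for s in permutations(range(1, 5))
                        for k in range(1, 5) for l in range(1, 5)
                        if a_stat(s, k, l) == 0}
print(codes_when_a_is_zero)        # {4}
```
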

Proof of Theorem 1. We have to show (15). The number of free parameters of the bi-decomposable log-linear model is $\dim(H) - 1$, since only those vectors $v \in H$ are allowed for which $p = e^v$ is a probability distribution. By (36), we only need to determine the dimension of each subspace $F^k \cap G^\ell$, which consists of the vectors of $U^k \cap V^\ell$ which are orthogonal to both $\bar{U}^k$ and $\bar{V}^\ell$.

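Let $u = \sum_{a,q} c_{aq} \rho^{k\ell}_{aq}$ be an arbitrary vector in $U^k \cap V^\ell$; we find when it is orthogonal to $\bar{U}^k$ and $\bar{V}^\ell$. First take a vector $v(\pi) = \chi\{\pi\{1 : k\} = C\}$ in the basis of $\bar{U}^k$, and introduce the notation $|C \cap \{1 : \ell\}| = a$. With $h = (k-1)!\,(n-k)!$, the scalar product is calculated as

$$\langle u, v \rangle = \begin{cases}
c_{a1}(k-a)h + c_{a2}(a-1)h + c_{a5}h & \text{if } \ell \in C, \\
c_{a3}\,a\,h + c_{a4}(k-a)h & \text{if } \ell \notin C.
\end{cases}$$

Similarly, if $v(\pi) = \chi\{\pi^{-1}\{1 : \ell\} = D\}$ is a basis vector of $\bar{V}^\ell$, $|D \cap \{1 : k\}| = a$, and $g = (\ell-1)!\,(n-\ell)!$, then

$$\langle u, v \rangle = \begin{cases}
c_{a3}(\ell-a)g + c_{a2}(a-1)g + c_{a5}g & \text{if } k \in D, \\
c_{a1}\,a\,g + c_{a4}(\ell-a)g & \text{if } k \notin D.
\end{cases}$$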



Thus $F^k \cap G^\ell$ consists of the linear combinations of those vectors $\sum_{q=1}^{5} c_{aq} \rho^{k\ell}_{aq}$ for which the above four linear combinations of the coefficients $c_{aq}$ are zero. Of the four constraints on the coefficients, only three are linearly independent, so in most cases there are two linearly independent solutions for the five coefficients. The cases $a = 0, 1, \min(k, \ell)$ must be treated separately; it is readily seen that in the case $a = 0$, the only solution is zero, while in the cases $a = 1, \min(k, \ell)$ there is one non-zero solution. Let $A^{k\ell}_a$ denote the number of linearly independent solutions, that is $A^{k\ell}_a$ is either zero, one or two. The following vectors form an orthogonal basis of $F^k \cap G^\ell$ (with the exception that some vectors may be 0):

$$\begin{aligned}
\tau^{k\ell}_{a1} &= -\rho^{k\ell}_{a2} + (a-1)\rho^{k\ell}_{a5}, \\
\tau^{k\ell}_{a2} &= -(\ell-a)a\,\rho^{k\ell}_{a1} + (k-a)(\ell-a)\rho^{k\ell}_{a2} - (k-a)a\,\rho^{k\ell}_{a3} + a^2\rho^{k\ell}_{a4} + (k-a)(\ell-a)\rho^{k\ell}_{a5}.
\end{aligned} \tag{42}$$

Finally, since

$$\sum_{k,\ell} \dim(F^k \cap G^\ell) = \sum_{i \ge 1} |\{(k, \ell) : \dim(F^k \cap G^\ell) \ge i\}|,$$

to finish the proof of the theorem, it suffices to show that for $1 \le i \le n$

$$|\{(k, \ell) : \dim(F^k \cap G^\ell) \ge i\}| = (n - i)^2.$$

To this end, let us find those $k, \ell$ for which $\dim(F^k \cap G^\ell) \ge 2j + 2$. This happens if among the quantities $A^{k\ell}_a$ there are either two 1's and at least $j$ 2's, or one 1 and at least $(j+1)$ 2's.

The first of these cases occurs when $\ell + k \le n + 1$ and $\min\{k, \ell\} \ge j + 2$, while the second case occurs when $\ell + k \ge n + 2$ and $\max\{k, \ell\} \le n - j - 1$. But if $\ell + k \le n + 1$ and $k, \ell \ge j + 2$, then $k, \ell \le n - j - 1$ also holds. Similarly, if $\ell + k \ge n + 2$ and $k, \ell \le n - j - 1$, then at the same time $k, \ell \ge j + 3 > j + 2$. Therefore, $\dim(F^k \cap G^\ell) \ge 2j + 2$ holds if and only if $j + 2 \le k, \ell \le n - j - 1$, and there are $[n - (2j + 2)]^2$ such pairs.

Let us find those $k, \ell$ for which $\dim(F^k \cap G^\ell) \ge 2j + 1$. This happens if among the quantities $A^{k\ell}_a$ there are either two 1's and at least $j$ 2's, or one 1 and at least $j$ 2's.

The first of these cases occurs when $\ell + k \le n + 1$ and $\min\{k, \ell\} \ge j + 2$, while the second case occurs when $\ell + k \ge n + 2$ and $\max\{k, \ell\} \le n - j$. But if $\ell + k \le n + 1$ and $k, \ell \ge j + 2$, then $k, \ell \le n - j - 1 < n - j$ also holds. Similarly, if $\ell + k \ge n + 2$ and $k, \ell \le n - j$, then at the same time $k, \ell \ge j + 2$. Therefore, $\dim(F^k \cap G^\ell) \ge 2j + 1$ holds if and only if $j + 2 \le k, \ell \le n - j$, and there are $[n - (2j + 1)]^2$ such pairs. □

Remark 1. In (42) we found an orthogonal basis of the space $H$. This orthogonality is
convenient for finding the parameters corresponding to a bi-decomposable distribution.
There exists a basis consisting of indicator vectors as well, as follows. Denote
for any $k,\ell,a$ $v_a^{k\ell} = \sum_{q} \rho_{aq}^{k\ell}$, where $\rho_{aq}^{k\ell}$ was defined in (39). That is, $v_a^{k\ell}$ is the indicator
vector of the event that there are exactly $a$ rooks in the upper left $k \times \ell$ rectangle of the
chessboard. The following vectors, together with $\mathbf{1}$, form a basis of $H$:

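\[
\begin{aligned}
v_a^{k\ell} &: \quad 1 \le k,\ell \le n-1, \quad \max(0, k+\ell-n) < a \le \min(k,\ell), \\
\rho_{a5}^{k\ell} &: \quad 1 \le k,\ell \le n-1, \quad \max(1, k+\ell-n) < a \le \min(k,\ell).
\end{aligned}
\tag{43}
\]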

This statement can be proved by induction; we omit the somewhat lengthy calculations.

Remark 2. Notice that the vectors $v_a^{k\ell}$, together with $\mathbf{1}$, form the basis of the subspace
associated with those positive distributions, called bi$_s$-decomposable distributions, which
belong to the intersection of the $L_s$- and $L'_s$-decomposable models. By Corollary [U this
is again a log-linear model, with generating product partitions

Hke = X (44) 

which we call the thin $(k,\ell)$th cross-sections, dividing the chessboard into four rectangles.
The component partitions were defined in (26). Notice that in this case, with $k, \ell$ fixed,
$\{v_a^{k\ell}\}$ ($a$ takes on all its possible values) is an orthogonal basis of the subspace
corresponding to $\mathcal{H}_{k\ell}$. These subspaces are "almost linearly independent" in the sense that
the only linear dependence is that they all contain the vector $\mathbf{1}$. From this it follows that
the number of parameters in this model is

\[ e_n = \sum_{j=0}^{\lfloor (n-1)/2 \rfloor} (n-2j-1)^2. \tag{45} \]

The vectors $\rho_{a5}^{k\ell}$ represent the "difference" between the bi-decomposable and the
bi$_s$-decomposable distributions.

Remark 3. We have calculated the number of free parameters in the L-decomposable,
bi-decomposable and bi$_s$-decomposable models. For the sake of completeness we mention that
the number of free parameters in the remaining $L_s$-decomposable model is given by

\[ 2^n - n - 1. \tag{46} \]

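The parameter counts can be checked numerically with a short sketch. It assumes that the bi-decomposable count is $d_n = \sum_{i=1}^{n} (n-i)^2$ (read off from the counting argument in the proof above), and it uses (45) and (46) for the other two models. The function names are illustrative only.

```python
def bi_decomposable_params(n: int) -> int:
    """d_n, assuming d_n = sum_{i=1}^{n} (n - i)^2 (taken from the counting argument)."""
    return sum((n - i) ** 2 for i in range(1, n + 1))


def bis_decomposable_params(n: int) -> int:
    """e_n as in (45): sum over j = 0..floor((n-1)/2) of (n - 2j - 1)^2."""
    return sum((n - 2 * j - 1) ** 2 for j in range((n - 1) // 2 + 1))


def ls_decomposable_params(n: int) -> int:
    """Count as in (46): 2^n - n - 1."""
    return 2 ** n - n - 1


if __name__ == "__main__":
    for n in range(2, 7):
        print(n, bi_decomposable_params(n), bis_decomposable_params(n),
              ls_decomposable_params(n))
```

For $n = 5$ the three counts are 30, 20 and 26. Subtracting them from $5! - 1 = 119$ gives 89, 99 and 93, which agrees with the degrees of freedom reported in Table 1.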
3.3 Maximum likelihood estimation 

As we have seen in the previous sections, the positive bi-decomposable distributions
on $S_n$ form an exponential family with $d_n$ parameters.

Denote by $\Pi_1, \ldots, \Pi_N$ a sample taken from a positive bi-decomposable distribution,
and let $r(\pi)$ stand for the relative frequency of the permutation $\pi$ in the sample. The
maximum likelihood estimate of the true distribution, or equivalently, of its parameters,
does not appear to have an explicit form in general; the likelihood function has to be
maximized by numerical methods. The iterative proportional fitting procedure (IPFP),
used in the theory of log-linear models, is one option. This algorithm converges to the
maximum likelihood estimate, if it exists. We briefly describe the implementation of this
algorithm in our setting.

The generators of the bi-decomposable log-linear model are the bold cross-sections
$\boldsymbol{\mathcal{H}}_{k\ell}$ in (32), which define the marginal functions $|\pi_{\boldsymbol{\mathcal{H}}_{k\ell}}|$ given by (15). The maximum
likelihood estimate is a distribution $p^* \in \mathcal{P}_b^+$ such that the distributions of the marginals
$|\Pi_{\boldsymbol{\mathcal{H}}_{k\ell}}|$ under $p^*$ are the same as under the empirical distribution $r$. There is at most one
such $p^*$ in $\mathcal{P}_b^+$. In some cases, the maximum likelihood estimate does not exist, because no
distribution in $\mathcal{P}_b^+$ gives the same distribution of the marginals as the empirical distribution
(we say that the sample contains structural zeros). In these cases a suitable $p^*$ can only be
found in the closure $\mathrm{cl}(\mathcal{P}_b^+)$. Numerical studies indicate that $\mathrm{cl}(\mathcal{P}_b^+) = \mathcal{P}_b$.

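The IPFP algorithm proceeds by cyclically fitting the distributions of the individual
marginals $|\Pi_{\boldsymbol{\mathcal{H}}_{k\ell}}|$ to that observed in the sample. It converges to the unique element in
$\mathrm{cl}(\mathcal{P}_b^+)$ which agrees with the empirical distribution in all marginals. Starting from an
arbitrary $p^0 \in \mathcal{P}_b^+$ (say the uniform distribution), the $n$th iteration step calculates

\[ p^{n+1}(\pi) = \frac{r\bigl(|\Pi_{\boldsymbol{\mathcal{H}}_{k\ell}}| = |\pi_{\boldsymbol{\mathcal{H}}_{k\ell}}|\bigr)}{p^{n}\bigl(|\Pi_{\boldsymbol{\mathcal{H}}_{k\ell}}| = |\pi_{\boldsymbol{\mathcal{H}}_{k\ell}}|\bigr)} \, p^{n}(\pi), \tag{47} \]

where the pair $(k,\ell)$ runs cyclically over all possible values.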

Remark 4. By Remark 2, maximum likelihood estimation in the bi$_s$-decomposable model
proceeds in an analogous way, namely by running the IPFP algorithm with $|\pi_{\mathcal{H}_{k\ell}}|$ of (44)
instead of $|\pi_{\boldsymbol{\mathcal{H}}_{k\ell}}|$, i.e. we use the thin cross-sections instead of the bold ones.
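
The proportional fitting step can be sketched for the thin cross-sections as follows. This is a minimal illustration and not the authors' implementation: it assumes that the marginal of the thin $(k,\ell)$th cross-section is the number of rooks in the upper left $k \times \ell$ rectangle (as suggested by Remark 1), it enumerates all of $S_n$ and is therefore only usable for small $n$, and the names `rook_count`, `marginal` and `ipfp_thin` are hypothetical.

```python
from itertools import permutations


def rook_count(perm, k, l):
    """Number of i <= k with perm(i) <= l; perm is a tuple (perm(1), ..., perm(n))."""
    return sum(1 for i in range(k) if perm[i] <= l)


def marginal(dist, stats):
    """Distribution of a statistic: total weight of dist for each value in stats."""
    m = {}
    for w, a in zip(dist, stats):
        m[a] = m.get(a, 0.0) + w
    return m


def ipfp_thin(r, n, sweeps=500):
    """Cyclically fit the rook-count marginals of all (k, l) to those of r.

    r lists the relative frequencies in the order of
    itertools.permutations(range(1, n + 1)).
    """
    perms = list(permutations(range(1, n + 1)))
    p = [1.0 / len(perms)] * len(perms)  # start from the uniform distribution
    for _ in range(sweeps):
        for k in range(1, n):
            for l in range(1, n):
                stats = [rook_count(q, k, l) for q in perms]
                target = marginal(r, stats)
                current = marginal(p, stats)
                # Rescale p so that this one marginal agrees with the sample.
                p = [w * target[a] / current[a] if w > 0.0 else 0.0
                     for w, a in zip(p, stats)]
    return perms, p
```

A fixed number of sweeps is used here for simplicity; in practice one would stop once the fitted and empirical marginals agree to a chosen tolerance.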

Remark 5. In $\mathcal{P}_L$ and in $\mathcal{P}_{L'}$, the maximum likelihood estimate can be given explicitly.
For example, the L-decomposable model is parametrized by the conditional probabilities
(11). The maximum likelihood estimate of these conditional probabilities is given by the
corresponding conditional probabilities under the empirical distribution.

Numerical studies indicate that the maximum likelihood estimate in the family $\mathrm{cl}(\mathcal{P}_b^+)$
can also be obtained by iteratively calculating the maximum likelihood projections on the
component spaces $\mathcal{P}_L$ and $\mathcal{P}_{L'}$.



4 Examples 

In this section, we collect some models from the literature, which are decomposable in at 
least one way. These models are submodels of the "free" decomposable models, since they 
place specific constraints on the parameters. 

Example 1 (Order statistics models). Consider an experiment in which people are asked
to rank sounds according to their loudness, say in increasing order. One might suppose
that the actual perception of each stimulus is a random variable, whose relative ordering
determines the person's ordering of the sounds. This example was studied by Thurstone
[20], and later by Daniels [5]. If the random variable associated to the $i$th sound is $X_i$ with
continuous distribution $F_i$, then the resulting distribution on the orderings is

\[ p(\pi) = P\bigl(X_{\pi(1)} < \cdots < X_{\pi(n)}\bigr). \]

The assumption that the $X_i$ are independent leads to the so-called order statistics models. If
the distributions form a location family $F_i(x) = F(x - \mu_i)$, the model is called a Thurstone
model. A well-studied case is when $F(x) = 1 - \exp(-\exp x)$ is the Gumbel distribution.
This is called the Luce model, and is equivalent to the model derived by Luce on the basis
of his ranking postulate and Choice Axiom in [15]. This model is L-decomposable, with
canonical decomposition

\[ \lambda(x, C) = \frac{\theta_x}{\sum_{y \notin C} \theta_y}, \]

where $\theta_y$ are arbitrary positive parameters (with sum equal to 1) associated with the
objects. However, the model is not L'-decomposable for arbitrary $\theta$.
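
As an illustration of the canonical decomposition, the probability of an ordering under the Luce model can be computed stage by stage. The sketch below assumes that $C$ denotes the set of objects already chosen, so that the next object $x$ is selected with probability $\theta_x$ divided by the sum of $\theta_y$ over the objects not yet chosen; the function name is hypothetical.

```python
from itertools import permutations


def luce_probability(ordering, theta):
    """Probability of an ordering (best object first) under the Luce model.

    theta maps each object to a positive weight; at every stage the next
    object x is chosen with probability theta[x] / (sum over remaining objects).
    """
    remaining = sum(theta.values())
    prob = 1.0
    for x in ordering:
        prob *= theta[x] / remaining
        remaining -= theta[x]  # x joins the set of already chosen objects
    return prob


if __name__ == "__main__":
    theta = {1: 0.5, 2: 0.3, 3: 0.2}
    for ordering in permutations(theta):
        print(ordering, luce_probability(ordering, theta))
```

Summing the returned values over all orderings gives 1, since each stage is a proper conditional distribution on the remaining objects.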

Example 2 (Paired comparisons models). A model suggested by Babington Smith [1] creates
the ordering of the $n$ objects by making every possible paired comparison independently
of each other. The result of such a tournament can be represented by a directed
graph: if the graph contains no directed cycle, then it corresponds to a unique ordering of
the objects. Conditioning on the event that the graph is cycle-free, we get

\[ p(\pi) = c(\theta) \prod_{i<j} \theta_{\pi(i)\pi(j)}, \]

where $\theta_{xy}$ is the probability that object $x$ is preferred to object $y$ in a paired comparison.
This model is L-decomposable, with

\[ \lambda(x, C) = \prod_{y \in C} \theta_{yx}. \]

However, the model is not L'-decomposable for arbitrary parameters.
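
For small $n$ the paired comparison model can be tabulated by brute force. The following sketch assumes that `theta[x][y]` is the probability that object `x` is preferred to object `y`, with `theta[y][x] = 1 - theta[x][y]`, and that objects are labelled `0, ..., n-1`; the function name is illustrative.

```python
from itertools import permutations


def babington_smith(theta, n):
    """Return {ordering: probability} for the paired comparison model.

    The weight of an ordering is the product of theta[x][y] over all pairs
    with x placed before y; normalising conditions on a cycle-free tournament.
    """
    weights = {}
    for order in permutations(range(n)):
        w = 1.0
        for i in range(n):
            for j in range(i + 1, n):
                w *= theta[order[i]][order[j]]
        weights[order] = w
    total = sum(weights.values())  # probability that the tournament is cycle-free
    return {order: w / total for order, w in weights.items()}


if __name__ == "__main__":
    theta = [[0.0, 0.9, 0.8],
             [0.1, 0.0, 0.7],
             [0.2, 0.3, 0.0]]
    for order, prob in babington_smith(theta, 3).items():
        print(order, round(prob, 4))
```

With all comparison probabilities equal to 1/2 the result is the uniform distribution on the orderings.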

Example 3 (Mallows-Bradley-Terry model). A special case of the paired comparison model
is given by $\theta_{xy} = \alpha_x / (\alpha_x + \alpha_y)$. This form of the paired comparison probabilities was suggested
by Bradley and Terry, and Mallows [16] suggested using these probabilities in the paired
comparison model. The resulting distribution on the orderings is given by

\[ p(\pi) = c(\alpha) \prod_{i=1}^{n} \alpha_{\pi(i)}^{\,n-i} = c(\alpha) \prod_{j=1}^{n} \alpha_j^{\,n-\pi^{-1}(j)}, \]

which is bi-decomposable.

Example 4 (Multistage ranking model). This model, investigated by Fligner and Verducci
[11], supposes the candidates are numbered from 1 to $n$. The ranking takes place stepwise.
In the $k$th step, the best $k - 1$ ranks are already given out. The $k$th best candidate is then
chosen from the remaining ones, but only the relative order of the remaining candidates is
taken into account. In particular, if the remaining candidates are $j_1 < \cdots < j_{n-k+1}$, then
choose $j_i$ with probability $\theta(i, k)$, where $\theta(i, k)$ are parameters satisfying
$\sum_{i=1}^{n-k+1} \theta(i, k) = 1$. It is easily seen that this model is L-decomposable with

\[ \lambda(x, C) = \theta\bigl(|C^c \cap \{1 : x\}|, \, |C| + 1\bigr). \]

The model is also L'-decomposable.

Example 5 (Repeated insertion model). This model, studied by Doignon, Pekec and
Regenwetter [9], assumes that the ordering is created by considering the candidates one after
the other (according to their fixed numbering), and inserting the current candidate into
the order already formed by the previous ones. More specifically, for each $k$, we have
insertion probabilities $\theta(i, k)$, $i = 1, \ldots, k$, with sum 1. For the $k$th candidate, there are $k$
possible places where he or she can be inserted into the order of the first $k - 1$ candidates:
insert him or her between the $(i - 1)$st and $i$th with probability $\theta(i, k)$. This model is
a "dual" of the multistage ranking model, in the sense that it can be described similarly
to it, by interchanging "ranks" and "candidates" (but not in the sense that the resulting
permutations are each other's inverse). It is L-decomposable with

\[ \lambda(x, C) = \theta\bigl(|C \cap \{1 : x\}| + 1, \, x\bigr), \]

and it is also L'-decomposable.
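
The insertion mechanism and the decomposition above can be compared directly on a small example. In this sketch `theta[k][i-1]` plays the role of $\theta(i, k)$, and $C$ is taken to be the set of candidates placed before $x$ in the final ordering; both function names are hypothetical.

```python
def repeated_insertion_distribution(theta, n):
    """Distribution of the final ordering when candidates 1..n are inserted in turn.

    theta[k][i - 1] is the probability of inserting candidate k at position i
    (1 <= i <= k) of the order formed by candidates 1..k-1.
    """
    dist = {(): 1.0}
    for k in range(1, n + 1):
        new_dist = {}
        for order, prob in dist.items():
            for i in range(1, k + 1):
                new_order = order[: i - 1] + (k,) + order[i - 1:]
                new_dist[new_order] = new_dist.get(new_order, 0.0) + prob * theta[k][i - 1]
        dist = new_dist
    return dist


def probability_via_decomposition(order, theta):
    """Product over the ordering of theta(|C ∩ {1:x}| + 1, x), C = objects before x."""
    prob, chosen = 1.0, set()
    for x in order:
        i = sum(1 for y in chosen if y < x) + 1
        prob *= theta[x][i - 1]
        chosen.add(x)
    return prob


if __name__ == "__main__":
    theta = {1: [1.0], 2: [0.3, 0.7], 3: [0.2, 0.5, 0.3]}
    for order, prob in sorted(repeated_insertion_distribution(theta, 3).items()):
        print(order, prob, probability_via_decomposition(order, theta))
```

Both computations give the same value for every ordering, because the position at which $x$ is inserted equals one plus the number of smaller-numbered candidates that end up before it.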

Example 6 (Quasi-independence log-linear model). Let $\theta_{ij}$, $1 \le i, j \le n$, be the elements
of an arbitrary doubly stochastic matrix, and

\[ p(\pi) = c(\theta) \prod_{i=1}^{n} \theta_{i\pi(i)} = c(\theta) \prod_{j=1}^{n} \theta_{\pi^{-1}(j)\,j}. \]

This distribution is by the above equation bi-decomposable. Writing it in the log-linear
form

logp(vr) = + a« ) + + • • • + ajji^, (48) 

it states the quasi-independence of the variables $\pi(i)$, $1 \le i \le n$.

It is easy to see that for $n \ge 3$ the random permutations belonging to the
quasi-independence model have the following property. For any partition $Z = (Z_1, Z_2)$ of size
2, as in (18), with $|Z_1| = 2$, the ordered marginals $\Pi_{Z_1}$ and $\Pi_{Z_2}$ are conditionally
independent, given the unordered marginals $\{\Pi_Z\}$. This property is a generalization of
L-decomposability, and we can prove that only the quasi-independent distributions possess
it. Moreover, for these distributions, a similar property is satisfied with arbitrary
partitions $Z$.

5 Invariance under relabellings 

As we have noted already, decomposability of a random pairing between sets $A$ and $B$
depends on how we label the elements of the two sets. Suppose that a labelling on both
sets is fixed, and the random pairing function $\Pi : A \to B$ with these labellings becomes
$\Pi_{orig} : \{1 : n\} \to \{1 : n\}$. Suppose that we relabel the set $A$ according to the permutation
$\sigma \in S_n$, that is, the object with original label $i$ receives the new label $\sigma(i)$. Similarly, relabel
the set $B$ according to $\rho \in S_n$. Denote the random pairing function $\Pi : A \to B$ with these
new labellings by $\Pi_{new} : \{1 : n\} \to \{1 : n\}$. If with the original labelling the pair of $i \in A$ is
$\pi(i) \in B$, then with the new labelling the pair of $\sigma(i) \in A$ is $\rho\pi(i) \in B$. Therefore, for
the distributions $P_{orig}$ and $P_{new}$,

\[ P_{orig}(\pi) = P(\Pi_{orig} = \pi) = P(\Pi_{new} = \rho\pi\sigma^{-1}) = P_{new}(\rho\pi\sigma^{-1}). \tag{49} \]

In this section we investigate whether L-decomposability is preserved after such relabellings
or not.

Definition 4. Let $\phi : S_n \to S_n$ be a one-to-one mapping. Then for any distribution $p$,
define $p_\phi(\pi) = p(\phi(\pi))$. We say that the family $\mathcal{P}$ of distributions is invariant under $\phi$, if

\[ \mathcal{P}_\phi = \{p_\phi : p \in \mathcal{P}\} \subseteq \mathcal{P}. \]

Let us introduce some notation. For any $\sigma \in S_n$ let $\phi_{\circ\sigma}$ and $\phi_{\sigma\circ}$ denote
the right and left multiplications by $\sigma$. Denote by $\sigma_{(12)}$ the permutation which exchanges
1 and 2 only, and by $\sigma_r$ the reversing permutation which maps $k$ to $n + 1 - k$.

In the ranking situation (ii) described in the Introduction, a model for a random ranking
is called label-invariant, if it is invariant under relabellings of the objects. It is called 
reversible, if it is invariant under reversing of the ranks. The concepts of label-invariance 
and reversibility were studied for some wide classes of ranking models in p]. 

Theorem 2. $\mathcal{P}_L$ is invariant under left multiplications, and under the group of right
multiplications generated by $\sigma_r$ and $\sigma_{(12)}$ (for $n \ge 4$, this group contains eight right
multiplications, including the identity). The family is not invariant under any other right
multiplication.
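
The size of the group generated by the reversal and the transposition of 1 and 2 is easy to confirm by enumeration. The sketch below uses 0-based labels and hypothetical helper names; it simply closes the identity under right multiplication by the two generators.

```python
def compose(p, q):
    """Composition (p q)(k) = p(q(k)) of permutations given as tuples on 0..n-1."""
    return tuple(p[q[k]] for k in range(len(p)))


def generated_group(generators):
    """All products of the generators, found by closing {identity} under multiplication."""
    n = len(generators[0])
    identity = tuple(range(n))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = compose(g, s)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group


def relabelling_group(n):
    """Group generated by the reversal k -> n+1-k and the transposition of 1 and 2."""
    sigma_r = tuple(n - 1 - k for k in range(n))
    sigma_12 = (1, 0) + tuple(range(2, n))
    return generated_group([sigma_r, sigma_12])


if __name__ == "__main__":
    for n in range(3, 8):
        print(n, len(relabelling_group(n)))
```

For $n \ge 4$ the transposition of 1 and 2 commutes with its conjugate by the reversal (the transposition of $n-1$ and $n$), which is why the group has order eight; for $n = 3$ the two generators produce all of $S_3$.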

Proof. Invariance under left multiplications follows e.g. from Property 3 in Definition 1, as
well as invariance under right multiplication by $\sigma_r$, since the Markov property is reversible.
Invariance under right multiplication by $\sigma_{(12)}$ can be checked directly using Property 1 in
Definition 1.

To show that the family is not invariant under other right multiplications, we prove
that for all other permutations $\sigma$ there exists a positive bi-decomposable distribution $p$,
such that $p_{\circ\sigma}$ is not L-decomposable. The group of right multiplications generated by
$\phi_{\circ\sigma_r}$ and $\phi_{\circ\sigma_{(12)}}$ consists of the multiplications by the following permutations:

\[ \mathrm{id},\ \sigma_r,\ \sigma_{(12)},\ \sigma_r\sigma_{(12)},\ \sigma_r\sigma_{(12)}\sigma_r,\ \sigma_{(12)}\sigma_r,\ \sigma_r\sigma_{(12)}\sigma_r\sigma_{(12)},\ \sigma_{(12)}\sigma_r\sigma_{(12)}. \tag{50} \]

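We will use the following property: let $p$ be L-decomposable. Suppose that the probability
of $\pi_{11}$ and $\pi_{22}$ is positive, and $a$ is such that $\pi_{11}\{1 : a\} = \pi_{22}\{1 : a\}$. Define the "crossover"
permutations:

\[
\pi_{12}(k) = \begin{cases} \pi_{11}(k) & \text{if } k \le a, \\ \pi_{22}(k) & \text{if } k > a, \end{cases}
\qquad
\pi_{21}(k) = \begin{cases} \pi_{22}(k) & \text{if } k \le a, \\ \pi_{11}(k) & \text{if } k > a. \end{cases}
\]

Then

\[ p(\pi_{12})\, p(\pi_{21}) = p(\pi_{11})\, p(\pi_{22}). \tag{51} \]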
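
The crossover property can be checked numerically on a concrete L-decomposable distribution. The sketch below uses the Luce model of Example 1 as the test case, with orderings written as tuples (object in first place, second place, and so on); the parameter values and function names are illustrative assumptions.

```python
from math import isclose


def luce_probability(ordering, theta):
    """Luce model: next object x is chosen with prob theta[x] / (sum over remaining)."""
    remaining = sum(theta.values())
    prob = 1.0
    for x in ordering:
        prob *= theta[x] / remaining
        remaining -= theta[x]
    return prob


def crossover(pi11, pi22, a):
    """Exchange the tails after position a; the first a entries must form the same set."""
    if set(pi11[:a]) != set(pi22[:a]):
        raise ValueError("the first a entries must cover the same set of objects")
    pi12 = pi11[:a] + pi22[a:]
    pi21 = pi22[:a] + pi11[a:]
    return pi12, pi21


if __name__ == "__main__":
    theta = {1: 0.10, 2: 0.20, 3: 0.30, 4: 0.15, 5: 0.25}
    pi11, pi22 = (1, 2, 3, 4, 5), (3, 1, 2, 5, 4)
    pi12, pi21 = crossover(pi11, pi22, 3)
    lhs = luce_probability(pi12, theta) * luce_probability(pi21, theta)
    rhs = luce_probability(pi11, theta) * luce_probability(pi22, theta)
    print(lhs, rhs, isclose(lhs, rhs, rel_tol=1e-9))
```

The two products agree because each factor of the Luce probability depends only on the next object and on the set of objects already placed, and this set is the same for both permutations at position $a$.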

If $\sigma$ is not a member of the permutations in (50), then neither is its inverse, and there
exists an $a$, $2 \le a \le n - 2$, such that

\[ \sigma^{-1}\{1 : a\} \ne \{1 : a\},\ \{n - a + 1 : n\}. \]

Let $a$ be such a number. Therefore there exist $c, e \in \{1 : a\}$ and $d, f \notin \{1 : a\}$, for which

\[ c^* = \sigma^{-1}(c) > \sigma^{-1}(d) = d^*, \qquad e^* = \sigma^{-1}(e) < \sigma^{-1}(f) = f^*. \]

For the numbers $\alpha, \beta, \gamma$ we say that $\alpha$ separates $\beta$ and $\gamma$, if $\beta < \alpha < \gamma$ or $\beta > \alpha > \gamma$.
Now, if $d^* > f^*$, then $d^*$ (and $f^*$ as well) separates $c^*$ and $e^*$. If $d^* < f^*$, then either one
of them separates $c^*$ and $e^*$, or $c^*$ (and $e^*$ as well) separates $d^*$ and $f^*$. Therefore, one of
the following two cases holds:

1. $\exists\, c, e \in \{1 : a\},\ d \notin \{1 : a\}$ : $d^*$ separates $c^*, e^*$

2. $\exists\, c \in \{1 : a\},\ d, f \notin \{1 : a\}$ : $c^*$ separates $d^*, f^*$

The two cases can be treated in the same way. Let us deal with the first one. Let
$f \notin \{1 : a\}$, $f \ne d$ be arbitrary, with $f^* = \sigma^{-1}(f)$. Recall (39), and let
$p = c(d^*) \exp\{\rho^{d^*d^*}_{d^*5}\}$; this is a positive bi-decomposable distribution. Let $\pi_{11} = \sigma^{-1}$, from
which we obtain $\pi_{22}$ by exchanging two pairs:

\[ \pi_{22}(c) = e^*, \quad \pi_{22}(e) = c^*, \quad \pi_{22}(d) = f^*, \quad \pi_{22}(f) = d^*. \]

Denote by $\pi_{12}$ and $\pi_{21}$ the crossover permutations. For these four permutations, $p_{\circ\sigma}$ does
not satisfy (51). On the one hand, multiplying $\pi_{11}$ by $\sigma$ from the right, we get the identity
permutation, for which $\rho^{d^*d^*}_{d^*5} = 1$. On the other hand, for both $\pi_{12}\sigma$ and $\pi_{21}\sigma$,
$\rho^{d^*d^*}_{d^*5} = 0$, since for the first, $d^*$ is not a fixed point, and for the second, there is an element
greater than $d^*$ among the first $d^*$ elements. This completes the proof. □

6 Discussion and application 

In this paper, we introduced log-linear models for random permutations, whose generators
are product partitions of the chessboard. Examples are the L-decomposable,
$L_s$-decomposable, bi-decomposable, and bi$_s$-decomposable models. In all of these cases, we determined
the number of parameters. We showed how to calculate the maximum likelihood estimate
of the continuous parameters either directly (for the L-decomposable model) or by the
iterative proportional fitting algorithm (in the other models). The natural order(s) implied
by the models on the set(s) is either known, or it also has to be estimated. We studied the
extent to which this order can be determined in the L-decomposable and the
bi-decomposable models. There are many other statistical questions of interest, which we did not
address in this paper. Another theoretically, and perhaps also practically, important
question is the characterization of decomposable distributions, if we do not restrict ourselves
to the strictly positive case.

Finally, we fit our models to one of the most investigated ranking data sets in the literature,
the 1980 election of the American Psychological Association (APA). This organization
elects a president each year by asking its members to rank five candidates. In 1980, 5738
complete rankings were cast. APA chooses the winner by the Hare system. See Fishburn
[10] for a review of the advantages and disadvantages of this system.

Analyses of these data can be found, among others, in Chung and Marden [3],
Diaconis [7], McCullagh [18], and Stern [19]. One characteristic feature of the data is that the
members of the association can be divided into three distinct groups: the research
psychologists (candidates 1 and 3 belong here), clinical psychologists (their candidates are 4 and
5), and the community psychologists, to whom 2 belongs. The first two groups represent
the majority of the members. Not surprisingly, analysis shows that each group tends to
prefer its own candidates.

Chung and Marden [3] fit orthogonal contrast models to the data. Diaconis [7] uses this
dataset to illustrate the method of spectral analysis of ranked data, with many pointers
to the literature. McCullagh [18] fits log-linear models based on inversions to the data. We
emphasize here that we do not attempt to provide a thorough analysis of the APA data;
our aim is merely to illustrate the fit of our models on a real dataset. We used the ordering
data. Table 1 shows the maximum of the log-likelihood function (L) and the chi-square
value of the goodness of fit (GOF), with the degrees of freedom (df) in parentheses, for all
models. In the last column, we give the standardized statistic $U = (\mathrm{GOF} - \mathrm{df})/\sqrt{\mathrm{df}}$. In all
cases, where applicable, the results appearing in Table 1 correspond to the best right/left
relabelling of the original data, which we found by an exhaustive search. As expected, the
order on the ranks indicated by the models coincides with the natural order from best to
worst. All models agree on the natural order of the candidates as well: 4 is in the middle,
with {1, 3} and {2, 5} on its two sides (notice that there are eight permutations fitting this
pattern, as stated in Theorem 2). The best fit is provided by the L-decomposable model. The
results indicate that the rankings violate decomposability (i.e. conditional independence
relations) more than the orderings. There are at least two ways to find better fitting models.
Firstly, we could check which conditional independences do not hold. Models which assume
decomposability at only some row-sections and some column-sections also fit into the
log-linear setting described in this paper. While a general expression for the number of free
parameters in these wider models is probably intractable, it can be calculated numerically
in any particular case. Secondly, one could try to reduce the number of parameters by
selecting a log-linear model with fewer generating partitions.

Table 1: Fit of log-linear models to APA ordering data

Model                     L         GOF (df)         U
saturated               -26612
L-decomposable          -26661       98.9 (70)      3.45
L'-decomposable         -26674      126.5 (70)      6.75
L_s-decomposable        -26684      144.8 (93)      5.37
L'_s-decomposable       -26697      171.7 (93)      8.16
bi-decomposable         -26687      151.8 (89)      6.66
bi_s-decomposable       -26701      180.1 (99)      8.15
uniform                 -27470     2183.0 (119)
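
The last column of Table 1 can be reproduced from the other two with a few lines of code. The sketch below assumes the standardization $U = (\mathrm{GOF} - \mathrm{df})/\sqrt{\mathrm{df}}$ and copies the GOF and df values from the table; the names `TABLE_1` and `standardized_statistic` are illustrative.

```python
from math import sqrt

# (model, GOF, df, U as printed in Table 1)
TABLE_1 = [
    ("L-decomposable", 98.9, 70, 3.45),
    ("L'-decomposable", 126.5, 70, 6.75),
    ("L_s-decomposable", 144.8, 93, 5.37),
    ("L'_s-decomposable", 171.7, 93, 8.16),
    ("bi-decomposable", 151.8, 89, 6.66),
    ("bi_s-decomposable", 180.1, 99, 8.15),
]


def standardized_statistic(gof: float, df: int) -> float:
    """U = (GOF - df) / sqrt(df)."""
    return (gof - df) / sqrt(df)


if __name__ == "__main__":
    for model, gof, df, printed in TABLE_1:
        print(f"{model:20s} U = {standardized_statistic(gof, df):.2f} (table: {printed})")
```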





7 Proofs 

Proof of Proposition O If the stated conditional independences hold, the random
permutation is clearly bi-decomposable. For the other direction, suppose $\Pi$ is bi-decomposable.
By L-decomposability, given {11^}, 11^. are conditionally independent. Conditioning on 
{n^^} as well does not ruin this independence, since the additional condition restricts the 
values of the 11^. one by one. Thus we proved that 11^^ are conditionally independent, 
given {n«} and {n-^}, and using L'-decomposability, the same is true for n_^^. Denote a
condition by E = {{11^} = u, {H^ ^} = v}, then

s 

p^U = 7t\E) =llP{U^^ = TT^JE), 
i=l 

and since 

n^, = (n«^xA^. : i<j<t), 

where Hk xx is a function of 117^, also 

t 

p(n,^ = %J^) = n = ^..xaJ£^), 

which proves the lemma. □ 

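Proof of Lemma 1. Since $L_2(\mathcal{D}_1 \cap \mathcal{D}_2) = L_2(\mathcal{D}_1) \cap L_2(\mathcal{D}_2)$, the spaces $L_2(\mathcal{D}_1)$ and $L_2(\mathcal{D}_2)$
intersect orthogonally if and only if for any $f \in L_2(\mathcal{D}_1)$, $g \in L_2(\mathcal{D}_2)$,

\[ E\bigl( [f - E(f \mid \mathcal{D}_1 \cap \mathcal{D}_2)]\,[g - E(g \mid \mathcal{D}_1 \cap \mathcal{D}_2)] \bigr) = 0. \]

If the conditional independence relation holds, then the following stronger equality
holds:

\[ E\bigl( [f - E(f \mid \mathcal{D}_1 \cap \mathcal{D}_2)]\,[g - E(g \mid \mathcal{D}_1 \cap \mathcal{D}_2)] \bigm| \mathcal{D}_1 \cap \mathcal{D}_2 \bigr) = 0. \]

In the other direction, if the spaces intersect orthogonally, then let $E_1 \in \mathcal{D}_1$, $E_2 \in \mathcal{D}_2$,
and denote by $C$ the event that

\[ P(E_1 \cap E_2 \mid \mathcal{D}_1 \cap \mathcal{D}_2) - P(E_1 \mid \mathcal{D}_1 \cap \mathcal{D}_2)\, P(E_2 \mid \mathcal{D}_1 \cap \mathcal{D}_2) > 0. \]

With $f = \chi(E_1)\chi(C)$ and $g = \chi(E_2)\chi(C)$,

\[
\begin{aligned}
& E\bigl( [f - E(f \mid \mathcal{D}_1 \cap \mathcal{D}_2)]\,[g - E(g \mid \mathcal{D}_1 \cap \mathcal{D}_2)] \bigr) \\
&\quad = E\bigl( E(fg \mid \mathcal{D}_1 \cap \mathcal{D}_2) - E(f \mid \mathcal{D}_1 \cap \mathcal{D}_2)\, E(g \mid \mathcal{D}_1 \cap \mathcal{D}_2) \bigr) \\
&\quad = E\bigl( \chi(C)\, [P(E_1 \cap E_2 \mid \mathcal{D}_1 \cap \mathcal{D}_2) - P(E_1 \mid \mathcal{D}_1 \cap \mathcal{D}_2)\, P(E_2 \mid \mathcal{D}_1 \cap \mathcal{D}_2)] \bigr) = 0.
\end{aligned}
\]

This is possible only if $P(E_1 \cap E_2 \mid \mathcal{D}_1 \cap \mathcal{D}_2) - P(E_1 \mid \mathcal{D}_1 \cap \mathcal{D}_2)\, P(E_2 \mid \mathcal{D}_1 \cap \mathcal{D}_2) \le 0$ with
probability 1. The reverse inequality is obtained similarly, thus $E_1$ and $E_2$ are conditionally
independent, given $\mathcal{D}_1 \cap \mathcal{D}_2$. □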

Proof of Lemma We apply Lemma 1 to $S_n$ endowed with the uniform distribution. In
this case, orthogonal intersection in the $L_2$-space is equivalent to orthogonal intersection
in $\mathbb{R}^{n!}$. The second statement in (22) holds, because it is easy to check that
$\sigma(\mathcal{R}' \times \mathcal{C}) \cap \sigma(\mathcal{R} \times \mathcal{C}') = \sigma(\mathcal{R} \times \mathcal{C})$. Concerning the first one, we have to prove that $|\Pi_{\mathcal{R}' \times \mathcal{C}}|$
and $|\Pi_{\mathcal{R} \times \mathcal{C}'}|$ are conditionally independent, given $|\Pi_{\mathcal{R} \times \mathcal{C}}|$, if $\Pi$ is a uniformly distributed
random permutation, which is again easy to check. □

Proof of LemmalM By supposition, $\mathrm{Pr}_{V_j} U_i \subseteq U_i$ for every $i$ and $j$, therefore $\mathrm{Pr}_{V_j} U \subseteq U$ for
every $j$, i.e. $U$ intersects each $V_j$ orthogonally. Consequently, $\mathrm{Pr}_U V_j \subseteq V_j$ for every $j$, which
yields $\mathrm{Pr}_U V \subseteq V$, which was to be proved. On the other hand, let
$W = \mathrm{Span}(U_i \cap V_j : i \in I, j \in J)$. Then $\mathrm{Pr}_{V_j} U_i \subseteq W$, furthermore $\mathrm{Pr}_U V_j = \mathrm{Pr}_{V_j} U \subseteq W$, which leads to
$\mathrm{Pr}_U V \subseteq W$. □

Proof of Lemma [7} We use that $U$ and $V$ intersect orthogonally if and only if the projection
operators onto them commute, that is $\mathrm{Pr}_U \mathrm{Pr}_V = \mathrm{Pr}_V \mathrm{Pr}_U$. Now

Pru.Pry, = (Pru - PruJ{Pry - PryJ = 

PruPry — Pru^Pry — PruPry^ + Pru^Pry^, 

and by supposition the operators in all four terms commute, yielding Pry^Pru^- The 
second statement follows from Lemma [3l □ 

Proof of Lemma O We use again that orthogonality in the $L_2$-space is equivalent to
orthogonality in $\mathbb{R}^{n!}$. In this proof, we use the notation

An element of F'^ is a difference /i = f — E{f \ ak), where / is cjfc-measurable. Orthogonal- 
ity to 1 means that E{fi) = 0. For the other statement, let gi be an element of , where 
j > k. It is easy to check that under the uniform distribution, ak and aj are conditionally 
independent, given ak- Therefore, 

E{fm) = E[E{fm I ^k)] = E[E{fi I ak)E{gi \ ak)] = 0, 

since $E(f_1 \mid a_k) = 0$ with probability 1. □